The plant endoplasmic reticulum UPRome: A repository and pathway browser for genes involved in signaling networks linked to the endoplasmic reticulum

Abstract The endoplasmic reticulum (ER) houses sensors that respond to environmental stress and underlie plants' adaptive responses. These sensors transduce signals that lead to changes in nuclear gene expression. The ER to nuclear signaling pathways are primarily attributed to the unfolded protein response (UPR) and are also integrated with a wide range of development, hormone, immune, and stress signaling pathways. Understanding the role of the UPR in signaling network mechanisms that associate with particular phenotypes is crucially important. While UPR‐associated genes are the subject of ongoing investigations in a few model plant systems, most remain poorly annotated, hindering the identification of candidates across plant species. This open‐source curated database provides a centralized resource of peer reviewed knowledge of ER to nuclear signaling pathways for the plant community. We provide a UPRome interactive viewer for users to navigate through the pathways and to access annotated information. The plant ER UPRome website is located at http://uprome.tamu.edu. We welcome contributions from researchers studying the ER UPR to incorporate additional genes into the database through the “contact us” page.

Understanding the integration of stress signals and how conditions can impair ER function is critical to understanding plant development and immunity.
Across eukaryotes, the unfolded protein response (UPR) consists of three or more major branches led by ER resident stress sensors. In mammals, one branch is led by the membrane-tethered activating transcription factor 6 (ATF6). Another branch is led by the inositol-requiring enzyme 1 (IRE1; α and β isoforms), which is a type 1 transmembrane protein kinase/endoribonuclease. The third branch is led by four kinases that mediate phosphorylation of eukaryotic translation initiation factor 2α (eIF2α): (a) GCN2 (General Control Non-repressible 2); (b) PERK (RNA-dependent Protein Kinase-like ER Kinase, also known as EIF2AK3); (c) HRI (Heme Regulated Inhibitor); and (d) PKR (Protein Kinase R) (Hollien, 2013; Verchot & Pajerowska-Mukhtar, 2021). In plants, the ER-resident sensors include three homologs of IRE1 named IRE1a, IRE1b, and IRE1c; two transcription factors, bZIP17 and bZIP28, that resemble the mammalian ATF6; and a single GCN2 orthologue that phosphorylates eIF2α (Verchot & Pajerowska-Mukhtar, 2021). The IRE1s have an endonucleolytic activity that splices an unconventional intron of the mRNA encoding the transcription factor XBP1 in mammals and bZIP60 in plants. The truncated XBP1 and bZIP60 transcription factors function to regulate ER stress-responsive genes.
Angiosperms have undergone intensive gene expansion, and polyploids have undergone more than one duplication event, leading to an overall expansion of gene families (Lespinet et al., 2002). Combined with the whole genome expansion are segmental and tandem duplications and the broad need for expanded stress adaptation. As a result, gene families in angiosperms have more members than those in Drosophila and Homo sapiens, which yields more complex gene regulatory networks. For example, the bZIP family is categorized as Group A through Group M plus Group S, and the Arabidopsis groups B and K are described as ER stress-affiliated factors. Arabidopsis has four group B/K members: AtbZIP60, AtbZIP17, AtbZIP28, and AtbZIP49.
Until now, annotated genes involved in ER-to-nuclear signaling across angiosperms have been difficult to access. Gathering such information is essential to gain insights into molecular mechanisms for stress regulation and their contributions to useful agronomic traits. Such information can aid researchers interested in signal transduction networks for fundamental studies, genetic engineering, or crop improvement. Although published gene expression datasets make it possible to link environmental stimuli to genome-wide analysis studies of plant gene families, it remains possible that gene family members within the current phylogenies contribute to new functional categories across a range of plant species because of their expansion.
We created a user-friendly database that integrates publicly available resources to help researchers manage the abundant and complex biological information that exists across plant species. The UPRome annotation database is a resource for comparative pathway analysis and multi-omics datasets, and it includes manually curated pathway information built on peer-reviewed literature and datasets. To better understand ER to nuclear signaling, we created this database to provide the most current information on these signaling networks across plant species. The database contains orthologues across distant angiosperms and links to various plant genome databases.
The database includes UPR relevant factors that influence protein maturation processes, and parallel ER to nuclear signaling pathways.
The website consists of a graphic interface that provides direct links to genes and proteins across plant species. The current database includes genes from Arabidopsis, maize, potato, rice, soybean, and tomato, which represent the most studied plant models for UPR and major food crops. The interactive viewer allows users to click within the image to view external identifiers for these plant species. The external identifiers include hotlinks to online data resources such as relevant plant omics databases, UniProt, NCBI RefSeq, NCBI GENE, EnsemblPlants, and ProteomicsDB. The goal of the database is to support basic research and the analysis of ER to nuclear signaling pathways, to enable genome analysis and modeling, and to support systems biology and education.
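The click-through structure described above (protein, then species, then database and accession) can be pictured as a nested lookup. The following is only a minimal sketch of that idea and not the site's implementation: the names `UPROME`, `species_tabs`, and `external_ids` are hypothetical, and every accession is a placeholder rather than a real identifier.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory layout: protein -> species -> database -> accession.
# All accessions are placeholders; none is a real identifier.
UPROME: Dict[str, Dict[str, Dict[str, str]]] = {
    "IRE1a": {
        "Arabidopsis": {
            "UniProt": "PLACEHOLDER_1",
            "EnsemblPlants": "PLACEHOLDER_2",
        },
        "maize": {"UniProt": "PLACEHOLDER_3"},
    },
}


def species_tabs(protein: str) -> List[str]:
    """Return the species for which a clicked protein has entries."""
    return sorted(UPROME.get(protein, {}))


def external_ids(protein: str, species: str) -> List[Tuple[str, str]]:
    """Return (database, accession) pairs for one protein in one species."""
    return sorted(UPROME.get(protein, {}).get(species, {}).items())
```

Calling `species_tabs("IRE1a")` mirrors the species tabs that appear after a click, and `external_ids("IRE1a", "Arabidopsis")` mirrors the list of databases and accessions shown for the selected species. Unknown proteins or species simply yield empty lists.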

| Data collection
We began with a list of core ER-associated genes related to the UPR in Arabidopsis that are experimentally validated in peer-reviewed literature downloaded from PubMed and Google Scholar. We obtained the gene names, gene IDs, and DOI links from the primary literature.
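One way to picture the curated seed list is as a small record holding the three fields named above (gene name, gene ID, and DOI). This is an illustrative sketch only: the class `CuratedGene`, the helper `dedupe_by_gene_id`, and the sample values are hypothetical and do not describe the database schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CuratedGene:
    """One literature-derived entry; field names are assumptions."""
    gene_name: str
    gene_id: str
    doi: str

    def doi_link(self) -> str:
        # DOIs resolve through the public doi.org resolver.
        return f"https://doi.org/{self.doi}"


def dedupe_by_gene_id(records: Iterable[CuratedGene]) -> List[CuratedGene]:
    """Keep the first record seen for each gene ID (case-insensitive)."""
    seen: Dict[str, CuratedGene] = {}
    for record in records:
        seen.setdefault(record.gene_id.upper(), record)
    return list(seen.values())
```

Deduplicating on the gene ID rather than the gene name is a design choice made here because the same gene can appear under several names in the literature.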
Using the AmiGO 2 Gene Ontology browser, we found the total number of genes and accessions using enrichment terms for cell compart-

transcription factors (in a box labeled "Targeted Gene Regulation Factors"). The diagram presents ER-resident factors related to growth and ER stress management, fatty acid synthesis, oxidative stress responses, cold and heat tolerance, and cell death regulation, which relate to various environmental challenges. Gene expansion has occurred across angiosperms, from which we infer that factors engaged in ER to nuclear signaling have likely also expanded and that functions may exist in some plant species that do not occur in Arabidopsis. The front page also provides access to an interactive viewer that takes users to a separate web page where they can explore genetic information across Arabidopsis, soybean, rice, maize, potato, and tomato. Key cellular compartments and biological processes are listed and will be updated routinely.

Figure 2).

| Data retrieval across plant genomes
FIGURE 2 Plant UPRome workflow. The database was constructed based on the reported unfolded protein response (UPR) associated genes in model plant species, especially Arabidopsis thaliana. Homology-based gene retrieval was carried out against major plant genome databases as indicated in the second panel. The retrieved genes were manually curated after careful inspection of their sequences, domains, and motifs as shown in the top third panel. We also incorporated a feed aggregator so that users can obtain up-to-date references related to the six plant species included in our database. We used Microsoft ASP.NET, C#, Bootstrapper, and Microsoft SQL Server for the implementation of the Plant UPRome.
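The retrieve-then-curate workflow can be illustrated with a short filter that shortlists homology hits for manual inspection. The text does not name the search tool or any cutoffs, so this sketch assumes BLAST-style 12-column tabular output, and the function name `shortlist` and both thresholds are hypothetical values chosen only for illustration.

```python
import csv
import io
from typing import Dict, List

# Column order of the standard BLAST tabular format (assumed input format).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def shortlist(tabular_text: str,
              max_evalue: float = 1e-10,      # hypothetical cutoff
              min_identity: float = 40.0      # hypothetical cutoff
              ) -> Dict[str, List[str]]:
    """Map each query gene to subject hits that pass both cutoffs."""
    candidates: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) != len(COLUMNS):
            continue  # skip blank, comment, or malformed lines
        hit = dict(zip(COLUMNS, row))
        passes = (float(hit["evalue"]) <= max_evalue
                  and float(hit["pident"]) >= min_identity)
        if passes:
            subjects = candidates.setdefault(hit["qseqid"], [])
            if hit["sseqid"] not in subjects:
                subjects.append(hit["sseqid"])
    return candidates
```

A filter like this only narrows the list. The inspection of sequences, domains, and motifs described in the workflow remains a manual step.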

Solanum tuberosum 37
Zea mays 44

FIGURE 3 Interactive viewer. (a) A screenshot of the interactive viewer. All the unfolded protein response (UPR) components are hotlinked so that users can obtain their information by clicking on each component. (b) A use case illustrating the retrieval of IRE1 information. Clicking on IRE1a or IRE1b opens the relevant gene information pages. Users can select one of the six species by clicking on the species name logo, which brings up web links to major databases.

| Interactive diagram
The Plant ER UPRome database includes an interactive diagram that shows factors associated with the ER UPR (Figure 3a). Users can click on a protein in the schematic, and several tabs then appear, one for each of the six plant species. The next step is to select a plant species, after which a drop-down list of databases and corresponding accessions appears (Figure 3b). These accessions are hotlinks that take users to the associated database, where they can obtain more in-depth information about the gene or protein, including the gene model, locus information, sequence, subcellular location, tissue expression patterns, and protein structure. For example, clicking on IRE1a or IRE1b in the diagram (based on Arabidopsis as the primary model reference genome) brings up a bar for IRE1a/b/c. Since little is known about the subcellular location of IRE1c, it is not featured in the diagram, but readers are provided information of its exis-

to information concerning ER-to-nuclear signaling networks, primarily the UPR, that occur across plant species. We expect that the UPR is only a portion of the entire story of ER to nuclear signaling and that a broader repertoire of signaling may occur across plant species, tissue types, or developmental stages. The current UPRome is a high-quality curated compendium of the fundamental UPR processes in plants. We will strive to expand the compendium to include more plant species and genes as more experimentally validated data are published. The UPRome is an open-source project that shares a visualization of the biological pathways and lists the current and past publications on the topic.
The Plant ER UPRome will be updated routinely by incorporating novel UPR-associated genes and newly published literature. We will expand this database to include data from more plant genomes, especially crop genomes. We welcome contributions from researchers studying the ER and UPR to incorporate additional genes into the database through the "contact us" page, where scientists in the community can communicate new information for updating the database.
We plan to add new features for functional, structural, and comparative omics studies.

ACKNOWLEDGMENTS
This work was supported by a grant from NSF (IOS #1759034).

CONFLICT OF INTEREST
There is no conflict of interest regarding this research.